Goto

Collaborating Authors

 frozen lake environment


A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks

arXiv.org Artificial Intelligence

This paper presents Post-Decision Proximal Policy Optimization (PDPPO), a novel variation of the leading deep reinforcement learning method, Proximal Policy Optimization (PPO). The PDPPO state transition process is divided into two steps: a deterministic step resulting in the post-decision state and a stochastic step leading to the next state. Our approach incorporates post-decision states and dual critics to reduce the problem's dimensionality and enhance the accuracy of value function estimation. Lot-sizing is a mixed integer programming problem for which we exemplify such dynamics. The objective of lot-sizing is to optimize production, delivery fulfillment, and inventory levels in uncertain demand and cost parameters. This paper evaluates the performance of PDPPO across various environments and configurations. Notably, PDPPO with a dual critic architecture achieves nearly double the maximum reward of vanilla PPO in specific scenarios, requiring fewer episode iterations and demonstrating faster and more consistent learning across different initializations. On average, PDPPO outperforms PPO in environments with a stochastic component in the state transition. These results support the benefits of using a post-decision state. Integrating this post-decision state in the value function approximation leads to more informed and efficient learning in high-dimensional and stochastic environments.


Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning

arXiv.org Artificial Intelligence

In this paper we study how transforming regular reinforcement learning environments into goal-conditioned environments can let agents learn to solve tasks autonomously and reward-free. We show that an agent can learn to solve tasks by selecting its own goals in an environment-agnostic way, at training times comparable to externally guided reinforcement learning. Our method is independent of the underlying off-policy learning algorithm. Since our method is environment-agnostic, the agent does not value any goals higher than others, leading to instability in performance for individual goals. However, in our experiments, we show that the average goal success rate improves and stabilizes. An agent trained with this method can be instructed to seek any observations made in the environment, enabling generic training of agents prior to specific use cases.


An Open-source Sim2Real Approach for Sensor-independent Robot Navigation in a Grid

arXiv.org Artificial Intelligence

This paper presents a Sim2Real (Simulation to Reality) approach to bridge the gap between a trained agent in a simulated environment and its real-world implementation in navigating a robot in a similar setting. Specifically, we focus on navigating a quadruped robot in a real-world grid-like environment inspired by the Gymnasium Frozen Lake -- a highly user-friendly and free Application Programming Interface (API) to develop and test Reinforcement Learning (RL) algorithms. We detail the development of a pipeline to transfer motion policies learned in the Frozen Lake simulation to a physical quadruped robot, thus enabling autonomous navigation and obstacle avoidance in a grid without relying on expensive localization and mapping sensors. The work involves training an RL agent in the Frozen Lake environment and utilizing the resulting Q-table to control a 12 Degrees-of-Freedom (DOF) quadruped robot. In addition to detailing the RL implementation, inverse kinematics-based quadruped gaits, and the transfer policy pipeline, we open-source the project on GitHub and include a demonstration video of our Sim2Real transfer approach. This work provides an accessible, straightforward, and low-cost framework for researchers, students, and hobbyists to explore and implement RL-based robot navigation in real-world grid environments.


Gym Tutorial: The Frozen Lake

#artificialintelligence

In this article, we are going to learn how to create and explore the Frozen Lake environment using the Gym library, an open source project created by OpenAI used for reinforcement learning experiments. The Gym library defines a uniform interface for environments what makes the integration between algorithms and environment easier for developers. Among many ready-to-use environments, the default installation includes a text-mode version of the Frozen Lake game, used as example in our last post. The first step to create the game is to import the Gym library and create the environment. The next line calls the method gym.make() to create the Frozen Lake environment and then we call the method env.reset() to put it on its initial state.